Combination of FST and CN search in spoken term detection
Abstract
Spoken Term Detection (STD) focuses on finding instances of a particular spoken word or phrase in an audio corpus. Most STD systems have a two-step pipeline: automatic speech recognition (ASR) followed by search. Two approaches to search are common: Confusion Network (CN) based search and Finite State Transducer (FST) based search. In this paper, we examine the combination of these two search approaches applied to the same ASR output. We find that CN search performs better on shorter queries and FST search performs better on longer queries. By combining the search results obtained from the same ASR decoding, we achieve better performance than with either search approach on its own. We also find that this improvement is additive to the usual combination of decoder results from different modeling techniques.
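The abstract does not say how the two result lists are merged, so the following is only a minimal sketch of one plausible scheme, not the authors' method. It assumes each search returns hits with an utterance id, a time span, and a posterior-like score in [0, 1]. Hits that overlap in time are fused with a weighted sum, and a hit found by only one search keeps its weighted score. The `Hit` type, the `combine` function, and the `w_cn` weight are all hypothetical names introduced for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Hit:
    utt: str      # utterance or file identifier
    start: float  # start time in seconds
    end: float    # end time in seconds
    score: float  # posterior-like detection score, assumed to lie in [0, 1]


def overlaps(a: Hit, b: Hit) -> bool:
    """True when two hits lie in the same utterance and their spans intersect."""
    return a.utt == b.utt and a.start < b.end and b.start < a.end


def combine(cn_hits, fst_hits, w_cn=0.5):
    """Merge CN and FST hit lists for one query (assumed weighted-sum fusion).

    Overlapping hits are fused into a single hit whose score is
    w_cn * cn_score + (1 - w_cn) * fst_score; a hit found by only one
    search contributes its weighted score alone.
    """
    merged = []
    used = set()  # indices of FST hits already paired with a CN hit
    for c in cn_hits:
        match = next(
            (i for i, f in enumerate(fst_hits)
             if i not in used and overlaps(c, f)),
            None,
        )
        if match is None:
            merged.append(Hit(c.utt, c.start, c.end, w_cn * c.score))
        else:
            f = fst_hits[match]
            used.add(match)
            merged.append(Hit(
                c.utt,
                min(c.start, f.start),
                max(c.end, f.end),
                w_cn * c.score + (1 - w_cn) * f.score,
            ))
    for i, f in enumerate(fst_hits):
        if i not in used:
            merged.append(Hit(f.utt, f.start, f.end, (1 - w_cn) * f.score))
    # Highest combined score first.
    return sorted(merged, key=lambda h: h.score, reverse=True)


if __name__ == "__main__":
    cn = [Hit("u1", 1.0, 1.5, 0.8), Hit("u2", 0.0, 0.4, 0.6)]
    fst = [Hit("u1", 1.1, 1.6, 0.6), Hit("u3", 2.0, 2.5, 0.9)]
    for hit in combine(cn, fst):
        print(hit)
```

Since the abstract reports that CN search is stronger on short queries and FST search on long ones, a natural variation under the same assumptions would be to make `w_cn` depend on query length rather than fixing it at 0.5.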
Similar resources
N-gram FST Indexing for Spoken Term Detection
An efficient indexing scheme is essential for spoken term detection (STD) on large databases, particularly for phone-based systems that have been widely adopted to achieve vocabulary-independent detection. While finite state transducer (FST) composition provides a standard indexing approach, n-gram reverse indexing is more flexible in connectivity representation and confiden...
Spoken Term Detection for Persian News of Islamic Republic of Iran Broadcasting
Islamic Republic of Iran Broadcasting (IRIB), one of the biggest broadcasting organizations, produces thousands of hours of media content daily. Accordingly, IRIB's archive is one of the richest archives in Iran, containing a huge amount of multimedia data. Monitoring this massive volume of data, and browsing and retrieving from this archive, is one of the key issues for this broadcasting...
Spoken Term Detection Using Multiple Speech Recognizers' Outputs at NTCIR-9 SpokenDoc STD subtask
This paper describes spoken term detection (STD) with false detection control using a phoneme transition network (PTN) derived from multiple speech recognizers' outputs at the NTCIR-9 SpokenDoc STD subtask. Using the output of multiple speech recognizers, the PTN method is effective at correctly detecting out-of-vocabulary (OOV) terms and is robust to certain recognition errors. However, it exhibits ...
TUKE at MediaEval 2013 Spoken Web Search Task
This paper provides a rough description of a zero-resource query-by-example retrieval system for the MediaEval 2013 spoken web search task. The proposed solution first implements voice activity detection (VAD) using a rule-based approach built on the variance of acceleration MFCC (VAMFCC). PCA-based segmentation, K-means clustering and GMM training are then used to build the posteriorgrams...
Spoken Term Detection Using Phoneme Transition Network from Multiple Speech Recognizers' Outputs
Spoken Term Detection (STD) that considers the out-of-vocabulary (OOV) problem has generated significant interest in the field of spoken document processing. This study describes STD with false detection control using phoneme transition networks (PTNs) derived from the outputs of multiple speech recognizers. PTNs are similar to subword-based confusion networks (CNs), which are originally derive...
Publication date: 2014